stochastic complexity
Algebraic Information Geometry for Learning Machines with Singularities
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied. The resulting form is useful for model selection, but not for generalization.
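For context, the rigorous asymptotic form referred to here is the one established in Watanabe's singular learning theory; the expression below is standard in that line of work, not quoted from this abstract:

\[
F(n) = \lambda \log n - (m - 1) \log\log n + O(1), \qquad \lambda \le \frac{d}{2},
\]

where $F(n)$ is the stochastic complexity of $n$ samples, $\lambda$ is a rational number obtained by resolution of singularities (the real log canonical threshold), $m$ is its multiplicity, and $d$ is the parameter dimension. Regular models have $\lambda = d/2$ and $m = 1$, recovering the BIC penalty; singular models have strictly smaller $\lambda$, which is why the corrected form matters for model selection.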
Decomposable Families of Itemsets
Tatti, Nikolaj, Heikinheimo, Hannes
The problem of selecting a small yet high-quality subset of patterns from a larger collection of itemsets has recently attracted a lot of research. Here we discuss an approach to this problem using the notion of decomposable families of itemsets. Such itemset families define a probabilistic model for the data from which the original collection of itemsets was derived. Furthermore, they induce a special tree structure, called a junction tree, familiar from the theory of Markov Random Fields. The method has several advantages. The junction trees provide an intuitive representation of the mining results. From the computational point of view, the model provides leverage for problems that could be intractable using the entire collection of itemsets. We provide an efficient algorithm to build decomposable itemset families, and give an application example with frequency bound querying using the model. Empirical results show that our algorithm yields high-quality results.
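To illustrate the factorization such a junction tree induces, here is a minimal sketch: a decomposable model evaluates any joint probability, and hence any frequency query, from clique marginals divided by separator marginals. The items, marginals, and tree below are hypothetical toy values; the paper's algorithm for building the family is not reproduced.

# A decomposable model factorizes over its junction tree:
#   P(x) = prod over cliques C of P(x_C) / prod over separators S of P(x_S).
# Hypothetical tree over binary items A, B, C: cliques {A,B} and {B,C},
# separator {B}.  The toy marginals below are mutually consistent.

clique_marginals = {
    ("A", "B"): {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.30},
    ("B", "C"): {(0, 0): 0.40, (0, 1): 0.15, (1, 0): 0.25, (1, 1): 0.20},
}
separator_marginals = {
    ("B",): {(0,): 0.55, (1,): 0.45},
}

def joint_probability(assignment):
    """P(assignment) under the decomposable model above."""
    p = 1.0
    for items, table in clique_marginals.items():
        p *= table[tuple(assignment[i] for i in items)]
    for items, table in separator_marginals.items():
        p /= table[tuple(assignment[i] for i in items)]
    return p

# Frequency of the itemset {A=1, B=1, C=1}: 0.30 * 0.20 / 0.45
print(joint_probability({"A": 1, "B": 1, "C": 1}))

Because every query touches only small local tables, such computations remain tractable where reasoning over the entire itemset collection would not be.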
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Finland > Uusimaa > Helsinki (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Research Report > New Finding (0.66)
- Instructional Material > Course Syllabus & Notes (0.47)
Testing Conditional Independence on Discrete Data using Stochastic Complexity
Marx, Alexander, Vreeken, Jilles
Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Among other results, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
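The regret term at the heart of such stochastic-complexity tests can be computed exactly. Below is a minimal sketch, assuming the standard linear-time recurrence of Kontkanen and Myllymäki for the multinomial normalizing sum C(L, n); the SCI decision rule itself is not reproduced here.

from math import comb, log2

def multinomial_regret(L, n):
    """Parametric complexity C(L, n) of an L-valued multinomial, n samples."""
    if n == 0 or L == 1:
        return 1.0
    # Base cases: C(1, n) = 1 and
    # C(2, n) = sum_k binom(n, k) (k/n)^k ((n-k)/n)^(n-k).
    c1 = 1.0
    c2 = sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
             for k in range(n + 1))
    if L == 2:
        return c2
    # Recurrence: C(L, n) = C(L-1, n) + n / (L-2) * C(L-2, n).
    for level in range(3, L + 1):
        c1, c2 = c2, c2 + n * c1 / (level - 2)
    return c2

# Stochastic complexity of a sample = its maximum-likelihood code length
# plus log2 C(L, n); e.g. the regret for a binary variable on 100 samples:
print(log2(multinomial_regret(2, 100)))  # about 3.7 bits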
- Europe > Germany > Saarland > Saarbrücken (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
The Stochastic Complexity of Principal Component Analysis
PCA (principal component analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. For an arbitrary matrix, they cannot, on their own, determine the size of the reduced dimension, but rather must be given this as an input. NML (normalized maximum likelihood) is a universal implementation of the Minimum Description Length principle, which gives an objective compression-based criterion for model selection. This work applies NML to PCA. A direct attempt to do so would involve the distributions of singular values of random matrices, which is difficult. A reduction to linear regression with a noisy unitary covariate matrix, however, allows finding closed-form bounds on the NML of PCA.
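For reference, the NML code length being bounded here is the standard one from the MDL literature (not displayed in the abstract):

\[
p_{\mathrm{NML}}(X) = \frac{p\bigl(X \mid \hat{\theta}(X)\bigr)}{\int p\bigl(Y \mid \hat{\theta}(Y)\bigr)\,dY},
\qquad
\mathrm{SC}(X) = -\log p\bigl(X \mid \hat{\theta}(X)\bigr) + \log \int p\bigl(Y \mid \hat{\theta}(Y)\bigr)\,dY,
\]

where $\hat{\theta}(\cdot)$ is the maximum-likelihood estimator and the integral ranges over all data matrices of the same shape. Model selection picks the reduced dimension minimizing $\mathrm{SC}(X)$; the normalizing integral is exactly where the singular-value distributions of random matrices would enter, and the regression reduction is what makes it tractable.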
- North America > United States (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Causal Discovery by Telling Apart Parents and Children
Marx, Alexander, Vreeken, Jilles
We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information-theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence, and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket, and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state-of-the-art causal discovery algorithms based on PC or GES leave undirected, and show this improves their precision, recall and F1 scores by up to 20%.
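To make the plug-in role of the CI test concrete, here is a simplified, non-stable sketch of the PC skeleton phase with an arbitrary test passed in; the function names are hypothetical, and neither stable PC's bookkeeping nor the Climb algorithm is reproduced.

from itertools import combinations

def skeleton(nodes, ci_test, max_cond=2):
    """Drop edge x-y as soon as some conditioning set z (|z| <= max_cond)
    renders x and y conditionally independent; keep the separating sets."""
    edges = {frozenset(pair) for pair in combinations(nodes, 2)}
    sepsets = {}
    for size in range(max_cond + 1):
        for edge in list(edges):
            x, y = tuple(edge)
            others = [n for n in nodes if n not in edge]
            for z in combinations(others, size):
                if ci_test(x, y, z):  # e.g. the SCI test from the paper
                    edges.discard(edge)
                    sepsets[edge] = z
                    break
    return edges, sepsets

The separating sets recorded here are what a subsequent orientation phase consumes; in the paper's pipeline, the Climb score then directs the edges that such constraint-based phases leave undirected.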
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Acknowledgements
Finally, there will also need to be a team of behavioral practitioners responsible for designing the organizational-computer collaboration features such as direct-manipulation metaphors, tutoring capabilities, group productivity enhancement techniques, and knowledge sharing/reuse concepts. But hiring such team members alone will not assure the future of the FSKBS field. There are major gaps in our understanding of the theory of ontologies, knowledge servers, collaboration, and knowledge reuse approaches. Researchers need not only to pioneer new theories for these and related FSKBS topics, but also to address the scale-up issues associated with applying the resulting techniques to large-scale KB problems. Researchers also need to evaluate current FSKBS applications to extract lessons learned and to compile design-related insights.
The Stochastic complexity of spin models: How simple are simple spin models?
Beretta, Alberto, Battistin, Claudia, de Mulatier, Clélia, Mastromatteo, Iacopo, Marsili, Matteo
Simple models, in information theoretic terms, are those with a small stochastic complexity. We study the stochastic complexity of spin models with interactions of arbitrary order. Invariance with respect to bijections within the space of operators allows us to classify models in complexity classes. This invariance also shows that simplicity is not related to the order of the interactions, but rather to their mutual arrangement.
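The model class in question, written here in standard notation (the listing itself does not display it), is the exponential family over $N$ spins with operators of arbitrary order:

\[
p(\mathbf{s} \mid \mathbf{g}, \mathcal{M}) = \frac{1}{Z_{\mathcal{M}}(\mathbf{g})} \exp\Bigl(\sum_{\mu \in \mathcal{M}} g^{\mu} \phi^{\mu}(\mathbf{s})\Bigr),
\qquad
\phi^{\mu}(\mathbf{s}) = \prod_{i \in \mu} s_i,
\]

where $\mathbf{s} \in \{-1, +1\}^N$, $\mathcal{M}$ is the chosen set of spin-product operators, and $\mathbf{g}$ their couplings. A bijection within the space of the $2^N - 1$ possible operators maps $\mathcal{M}$ to another model with the same stochastic complexity, which is what yields the complexity classes and decouples simplicity from interaction order.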
- Europe > Norway > Central Norway > Trøndelag > Trondheim (0.24)
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.24)
- Europe > France > Île-de-France > Paris > Paris (0.24)
- (3 more...)
Stochastic complexity of Bayesian networks
Yamazaki, Keisuke, Watanabe, Sumio
Bayesian networks are now used in a wide range of fields, for example, diagnosis of a system, data mining, clustering, and so on. In spite of their wide range of applications, their statistical properties have not yet been clarified, because the models are non-identifiable and non-regular. In a Bayesian network, the set of parameters of a smaller model is an analytic set with singularities in the parameter space of the larger model. Because of these singularities, the Fisher information matrices are not positive definite. In other words, the mathematical foundation for learning has not been constructed. In recent years, however, we have developed a method to analyze non-regular models using algebraic geometry. This method revealed the relation between a model's singularities and its statistical properties. In this paper, applying this method to Bayesian networks with latent variables, we clarify the order of the stochastic complexities. Our result claims that their upper bound is smaller than the dimension of the parameter space. This means that the Bayesian generalization error is also far smaller than that of a regular model, and that Schwarz's model selection criterion BIC needs to be improved for Bayesian networks.
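In the notation of singular learning theory, the claim about the order of the stochastic complexity can be read as follows (the abstract states it only verbally):

\[
F(n) = \lambda \log n + o(\log n), \qquad \lambda < \frac{d}{2},
\]

where $d$ is the dimension of the parameter space, while BIC penalizes with $(d/2)\log n$, i.e. implicitly assumes $\lambda = d/2$. Since the Bayesian generalization error scales as $\lambda / n$, a coefficient strictly below $d/2$ both shrinks the generalization error relative to a regular model and makes the BIC penalty too large for these networks.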
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.76)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.76)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Algebraic Information Geometry for Learning Machines with Singularities
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied.
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)